On  The  Development  of  Visual  Object  Memory: 
The  Stay/Go  Decision  Problem 


Clayton  T.  Morrison 

Computer  Science 
University  of  Massachusetts 
Amherst,  MA  01003 
clayton@cs.umass.edu


Paul  R.  Cohen 

Computer  Science 
University  of  Massachusetts 
Amherst,  MA  01003 
cohen@cs.umass.edu


Paola  Sebastiani 

Mathematics  &  Statistics 
University  of  Massachusetts 
Amherst,  MA  01003 
sebas@math.umass.edu


Abstract 

A developing memory requires a mechanism for deciding how much information to gather, based on what is currently represented in memory. That is, we need to know when we have seen enough to say that we have or have not seen an object before, and when we need to continue collecting data. We present a novel statistical approach to this decision mechanism, which serves as the foundation for a simple visual object memory. We present results from simulations showing that the statistical measure can serve as the basis of the stay/go decision process.

1  Introduction 

Consider a robot faced with the task of learning to distinguish objects in its environment. As the robot moves around its environment it makes visual contact with objects, some of which it has seen before and some of which are novel. Once it makes visual contact, the robot looks at an object from several angles before deciding to move on in search of other objects. This paper develops a statistical framework for the decision to stay and continue looking at an object or go and look at another object. We call this the stay/go problem. This paper does not try to solve the problem in an optimal way; rather, it provides a simple method that keeps the robot looking at an object until it is reasonably certain that additional views will not help it discriminate the object from others it knows. The set of views that have been observed is the foundation of a visual object memory.

2  Object  Representation:  Curvature 
Scale-Space 

Our  robot  is  a  Pioneer  II  with  a  Sony  pan-tilt-zoom 
camera.  Each  image  of  an  object  is  processed  by 
an  algorithm  that  generates  a  curvature  scale-space 
(CSS)  representation.  CSS  diagrams  like  the  one  in 


Figure 1: Curvature Scale Space representation: (a) Original image of object, (b) Extracted pixels and border of tracked object, (c) Curvature Scale Space diagram.


Figure 1.c represent how the curvature of each point on a silhouette of the object changes with repeated smoothing [1]. The horizontal axis (often denoted by $u$) represents the arc length along the silhouette curve, and the vertical axis (often denoted by $\sigma$) represents the degree of smoothing applied to the curvature. The shift from black to white along each horizontal line of the diagram indicates a shift from positive to negative change in the slope of the tangent line to the silhouette curve. Each peak in a CSS diagram represents an "appendage" in the original silhouette (Fig. 1.b). Suppose one has the CSS diagram $S$ of a new image and wishes to identify the object in the image by comparing $S$ to other CSS diagrams in a corpus and retrieving their associated images. It has been shown [2, 3] that the information in CSS diagrams is usually sufficient to retrieve images that closely match the outline shape of the new image, despite differences in scale and orientation.

CSS diagrams can be represented as points in a high-dimensional space as follows: Each peak in a diagram has a horizontal and vertical coordinate, so a diagram with $N$ peaks can be represented as a single point in a space of $2N$ dimensions. However, not all CSS diagrams have the same number of peaks, and the location of peaks depends on the rotation of the object in the image. We fix the latter problem by selecting the tallest peak in the CSS diagram and shifting it to the left-hand border. Since the horizontal axis of the CSS diagram represents the length of the silhouette border, the left and right edges of the horizontal axis are actually adjacent, i.e., the horizontal axis is circular. Any peaks to the left of the tallest peak before shifting "wrap around" to the right side during shifting (Figure 2.a and 2.b). Once a CSS diagram has been shifted, the next task is to tackle the problem of variations in the numbers of CSS peaks. We choose $k$, the number of peaks we will represent; $k$ may be greater or smaller than the actual number of peaks in any given CSS diagram. The multidimensional point representation of the CSS diagram, called the canonical representation, then has $2k - 1$ dimensions ($-1$ because the horizontal coordinate of the largest peak after shifting is always 0, as is the case for coordinate $u_1$ of peak 1 in Figure 2). Finally, the horizontal position of each peak other than the largest fills the first $k - 1$ dimensions of the canonical representation, and the remaining $k$ dimensions are filled by the vertical values of each peak. These are filled in order from the largest to the smallest represented CSS diagram peak; when a diagram has fewer than $k$ peaks, the unused dimensions are set to 0. Figure 2 shows an example of this canonicalization; note that the numbering of the peaks follows their order by height, which in turn is reflected in the order in which the values appear in the canonical representation. Had there been six CSS peaks, the 6th (and therefore smallest) peak would not be represented in the canonicalization.
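The canonicalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `canonicalize`, the `(u, sigma)` tuple layout, and the `length` parameter (the circumference of the circular horizontal axis) are assumptions introduced here.

```python
from typing import List, Tuple


def canonicalize(peaks: List[Tuple[float, float]], k: int,
                 length: float = 1.0) -> List[float]:
    """Map CSS peaks (u, sigma) to a (2k - 1)-dimensional canonical point.

    `length` is the assumed circumference of the circular u axis.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not peaks:
        return [0.0] * (2 * k - 1)
    # Rank peaks from tallest to smallest by their vertical value sigma.
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    u_ref = ranked[0][0]
    # Circular shift so the tallest peak lands at u = 0; peaks that were to
    # its left wrap around to the right-hand side.
    shifted = [((u - u_ref) % length, s) for u, s in ranked]
    kept = shifted[:k]            # drop any peaks beyond the k tallest
    pad = k - len(kept)           # zero-fill when fewer than k peaks exist
    # The tallest peak's u is always 0, so it is omitted (hence 2k - 1).
    us = [u for u, _ in kept[1:]] + [0.0] * pad
    sigmas = [s for _, s in kept] + [0.0] * pad
    return us + sigmas
```

With four peaks and `k = 5`, the result has the layout shown in Figure 2.c: three horizontal coordinates, one zero, four vertical values, and a final zero.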

In summary, images of objects are first represented as CSS diagrams and then as points in $(2k - 1)$-dimensional space (representing the $k$ highest peaks of the diagram).

3  Object  Labels  and  the  Stay/Go  Decision

As soon as the robot makes visual contact with some object it generates a unique label; it attaches the label to the CSS diagrams of every image it collects while remaining in continuous visual contact. If the robot observes the same object again at a later time, the images in the new set get a new label. In general there are several sets of uniquely-labeled CSS diagrams for each object. For example, if the robot collects 25 images of a dog, then 30 of a man, then another 20 of the original dog, its memory contains two sets of CSS diagrams of the dog, each with a unique label. How big should the second set of dog images be? If additional images


(c) Canonical representation: $(u_2, u_3, u_4, 0, \sigma_1, \sigma_2, \sigma_3, \sigma_4, 0)$

Figure 2: Deriving a canonical representation of a CSS diagram: (a) Initial CSS diagram, (b) Shifted CSS diagram, with example $(u, \sigma)$ coordinate for peak 2, (c) canonical representation of $k = 5$ possible CSS points; with only four actual peaks, the remaining dimensions of the canonical representation have 0 values.


cannot  help  the  robot  decide  either  that  the  object  is 
different  from  the  man  or  that  it  is  the  same  as  the 
first  dog,  then  the  robot  should  stop  collecting  images. 

4  A  Model  of  the  Stay/Go  Decision

In general the stay/go decision concerns the value of additional data: One should stay and collect more data as long as they are valuable, otherwise one should go. The value of data depends on what one intends to do with the data; specifically, the value of additional images of an object depends on what the robot will do with the images. Yet we do not want the robot's stay/go strategy to depend on the specifics of a particular task. We want the value of images to depend on those the robot already has in memory and on the new images it collects. We want the existing and new images to be put to work in an extremely general task with a measure of merit $\phi$, and we want $\phi$ to reach a maximum after the robot collects a finite number of new images. When $\phi$ reaches its maximum (or minimum) value, the sample of new images is as big as it should be: making it larger will not increase (or decrease) $\phi$, the robot's score on its task. At this point, the robot should stop collecting images of the current object, because these images will not improve its score.

Let us develop a version of $\phi$ for a simple problem and then show how to extend it to the robot's stay/go decision. Suppose one already has a sample of a random variable $x_a$ with mean $\bar{x}_a$, variance $s_a^2$, and sample size $n_a$. Sample $a$ is analogous to a set of identically-labelled CSS diagrams in memory. Now, one starts to collect new data and accumulate it in sample $b$. This sample is analogous to what we call new images, above. How many data should one collect, that is, how big should $n_b$ become? Can we formulate a task that, when performed on samples $a$ and $b$ and evaluated with function $\phi$, has a maximum value of $\phi$ at some value $n_b$? Here is one: We play a game that has two conditions, one in which samples $a$ and $b$ are treated as different, and one in which they are appended into a single sample, and we ask how much better we can play the game in the former condition than the latter. In the first condition, an element $x$ is drawn at random from sample $a$ or sample $b$, you are told which sample it is from, and invited to guess its value. You are assessed an error, which is the squared difference between your guess and $x$. To minimize this error your guess should be $\bar{x}_a$ (or $\bar{x}_b$, depending on which sample $x$ comes from). Suppose this game is repeated $n_a$ times with elements of sample $a$ and $n_b$ times with elements of sample $b$. Then the expected total error assessed against you is:

$$\sum_{i=1}^{n_a} (x_i - \bar{x}_a)^2 + \sum_{j=1}^{n_b} (x_j - \bar{x}_b)^2 \tag{1}$$

We denote these sums $SS_a$ and $SS_b$. Note that these sums of squares are related to the variances $s_a^2$ and $s_b^2$ by $SS_a = n_a s_a^2$ and $SS_b = n_b s_b^2$.

In a second condition of the game, samples $a$ and $b$ are first appended into a single sample $g$ with mean $\bar{x}_g$. The guessing is repeated $n_g = n_a + n_b$ times, with expected error:

$$SS_g = \sum_{i=1}^{n_a} (x_i - \bar{x}_g)^2 + \sum_{j=1}^{n_b} (x_j - \bar{x}_g)^2 \tag{2}$$

Clearly, if $a$ and $b$ are very similar samples in their means and variances, then the value of $SS_a + SS_b$ will be very similar to $SS_g$; conversely, if $a$ and $b$ are different, then $(SS_a + SS_b) < SS_g$. The difference $SS_g - (SS_a + SS_b)$ can be interpreted as a reduction in errors obtained by making a distinction between samples $a$ and $b$, as opposed to treating them as one undifferentiated sample $g$. When we express this reduction in errors as a proportion of $SS_g$, we obtain the desired function $\phi$:

$$\phi = \frac{SS_g - (SS_a + SS_b)}{SS_g} \tag{3}$$

One can see that $\phi$ has a maximum and that it depends on $n_a$ and $n_b$ (for now, suppose the variances $s_a^2$ and $s_b^2$ are equal; we will generalize this in a moment). Clearly, when $n_b$ is much smaller than $n_a$, $SS_g$ is dominated by $SS_a$, and, conversely, when $n_b$ is much larger than $n_a$, $SS_g$ is dominated by $SS_b$. As $SS_g$ becomes dominated by either $SS_a$ or $SS_b$ the numerator of Equation 3 approaches zero. In fact, when $s_a^2 = s_b^2$, the maximum value of $\phi$ occurs when $n_a = n_b$. More generally, when the variances of $a$ and $b$ may be unequal, we can find the value of $n_b$ at which $\phi$ is maximum in the standard way, by differentiating Equation 3 and finding the value of $n_b$ at which the derivative is zero. Equation 3 can be simplified by using the decomposition

$$SS_g = SS_a + SS_b + \frac{n_a n_b (\bar{x}_a - \bar{x}_b)^2}{n_g} \tag{4}$$

so that

$$\phi = \frac{(n_a n_b / n_g)(\bar{x}_a - \bar{x}_b)^2}{SS_a + SS_b + (n_a n_b / n_g)(\bar{x}_a - \bar{x}_b)^2} \tag{5}$$

To maximize this function in $n_b$ is difficult because $SS_b$ and $\bar{x}_b$ are functions of $n_b$, but one can make some approximations. As $n_b$ increases, $(\bar{x}_a - \bar{x}_b)^2$ becomes approximately constant, and provided $n_b$ and $n_a$ are not too small we can approximate both sums of squares by the sample variances: $SS_a = n_a s_a^2$ and $SS_b = n_b s_b^2$. In this way, the value of $n_b$ that maximizes $\phi$ is the solution of the equation

$$n_b^2 s_b^2 = n_a^2 s_a^2 \tag{6}$$

When the variances are the same, $\phi$ is maximized when $n_a = n_b$, and in general when

$$n_b = n_a \frac{s_a}{s_b} \tag{7}$$

To recap, we drew an analogy between the stay/go decision and the following statistical question: Given a sample $a$ of size $n_a$ with variance $s_a^2$, how many data should we collect into a sample $b$ for a performance metric $\phi$ to reach its maximum value? The metric in this case is the reduction in errors that one achieves by treating samples $a$ and $b$ as different (Equation 3). More specifically, if one repeatedly guesses the value of a datum $x$, then $\phi$ is the reduction in errors (the squared difference between $\mathrm{guess}_i$ and $x_i$) due to knowing whether $x$ came from sample $a$ or $b$. Said differently, $\phi$ is the value of treating $a$ and $b$ as if they came from different populations. If, in fact, $a$ and $b$ are drawn from different populations, then $\phi$ will have a nice, clear peak when $n_b = n_a (s_a / s_b)$; otherwise $\phi$ will hover around zero for all values of $n_b$.
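The step from Equation 5 to Equation 6 is not spelled out in the text. One way to fill it in, assuming (as the text does) that $c = (\bar{x}_a - \bar{x}_b)^2$ is treated as constant and that $SS_a = n_a s_a^2$ and $SS_b = n_b s_b^2$, is sketched below; the auxiliary function $g$ is introduced here only for the derivation.

```latex
% Write Equation 5 as \phi = cm / (SS_a + SS_b + cm), with m = n_a n_b / (n_a + n_b).
% Then 1/\phi = 1 + (SS_a + SS_b)/(cm), so maximizing \phi in n_b is the same
% as minimizing
\begin{align*}
g(n_b) &= \frac{(n_a s_a^2 + n_b s_b^2)(n_a + n_b)}{n_a n_b}
        = s_a^2 + s_b^2 + \frac{n_a s_a^2}{n_b} + \frac{n_b s_b^2}{n_a}, \\
\frac{dg}{dn_b} &= -\frac{n_a s_a^2}{n_b^2} + \frac{s_b^2}{n_a} = 0
  \quad\Longrightarrow\quad n_b^2 s_b^2 = n_a^2 s_a^2, \\
\frac{d^2 g}{dn_b^2} &= \frac{2 n_a s_a^2}{n_b^3} > 0 .
\end{align*}
```

The positive second derivative confirms that the stationary point is a minimum of $g$ and hence a maximum of $\phi$; taking the positive root of the stationarity condition gives $n_b = n_a s_a / s_b$.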

Mapping this analogy back to the robot's stay/go decision, the robot should collect a sample of images $b$ as long as there is value in treating the images in $a$ and $b$ as if they are images of different objects. When they are, in fact, images of different objects, then $\phi$ will have a well-defined peak at $n_b = n_a (s_a / s_b)$; otherwise $\phi$ will hover around zero for all values of $n_b$.

5  The  Stay/Go  Decision  Methods 

It is easy to implement this stay/go strategy when images of objects are represented as CSS diagrams. Recall that the robot's memory contains sets of CSS diagrams and all the diagrams in a set represent images of the same object. Recall, too, that each CSS diagram is transformed to a canonical representation point in $(2k - 1)$-dimensional space. For a given set, $a$, we can calculate both its centroid $\bar{x}_a$ and $SS_a$, the sum of squared distances between each element of $a$ and $\bar{x}_a$. This is all the information we need to calculate $\phi$ as in Equation 3.
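The calculation just described can be sketched as follows, using only the Python standard library. The function names (`centroid_and_ss`, `phi`, `phi_curve`) are hypothetical, points are assumed to be equal-length sequences of floats, and returning 0 when the pooled sum of squares is zero is a convention chosen here rather than something the text specifies.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid_and_ss(points: Sequence[Point]) -> Tuple[List[float], float]:
    """Return the centroid and the sum of squared distances to it."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    ss = sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dim))
    return centroid, ss


def phi(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Equation 3: proportional error reduction from keeping a and b apart."""
    _, ss_a = centroid_and_ss(a)
    _, ss_b = centroid_and_ss(b)
    _, ss_g = centroid_and_ss(list(a) + list(b))  # pooled sample g
    if ss_g == 0.0:
        return 0.0  # assumed convention: identical points carry no value
    return (ss_g - (ss_a + ss_b)) / ss_g


def phi_curve(a: Sequence[Point], views: Sequence[Point]) -> List[float]:
    """phi after each new view is added to sample b (n_b = 1, 2, ...)."""
    return [phi(a, views[:n]) for n in range(1, len(views) + 1)]
```

For example, `phi([(0, 0), (2, 0)], [(10, 0), (12, 0)])` evaluates to 100/104, which agrees with Equation 5 because $(n_a n_b / n_g)(\bar{x}_a - \bar{x}_b)^2 = 100$ and $SS_a + SS_b = 4$.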

There are two general approaches to implementing the stay/go decision process based on $\phi$, each of which has two variations (for a total of four methods). The analytic approach uses Equation 7, the anticipated number of views needed to achieve a maximal value for $\phi$. In Equation 7, sample $a$ corresponds to the set of CSS view representations in memory. The size, $n_a$, is already known, as well as $s_a^2$. The difference between the two analytic approaches is in how $s_b^2$ is calculated, where sample $b$ is the set of new views of the current object the robot is observing. The Analytic-Complete method calculates $s_b^2$ based on the complete set of views for the currently experienced object. Clearly this method can only be implemented in simulation, as it requires the set of possible views for a particular object to be known a priori. For the simulations below, this method provides a baseline of performance. The Analytic-Estimate method estimates $s_b^2$ based on the current sample of views. The estimate for the current object being observed will gradually change as new views are acquired.

The other group of stay/go approaches is empirical because the decision to go is based on the current shape of the developing $\phi$ curve as views are collected. As noted, $\phi$ will either grow to a peak and then decrease as more views are experienced, or $\phi$ will hover around zero (Figure 3). In the empirical method, a peak-finding test sweeps a window across the $\phi$ curve, from left to right, looking for the first instance in which the average $\phi$ values to the left and right of the window are lower than the average of the window center. This will find a peak if one exists. Such a peak is found for $\phi$-curve A1 in Figure 3 (several peaks are possible for A1, but the decision to go would be based on the first that is found). For $\phi$-curve A2, however, there is not


Figure 3: Example of curves generated by plotting values of $\phi$ as views are accumulated. The two curves represent $\phi$ values that result from comparing the sets of views for objects A1 and A2 (both in memory) to the current object.


enough of a peak present for the window test. Instead, an additional test must be defined. The choice of test makes the difference between the two variations in the empirical method. In both variations, $\phi$ is regressed on $n_b$. The slope of the regression line is expected to be positive when $\phi$ has a peak and zero otherwise. The Empirical-Threshold method calls the regression line flat (i.e., $\phi$ does not peak) when its slope is less than some threshold near zero. The Empirical-Confidence method, on the other hand, calls the regression line flat if a confidence interval for the slope of the line contains zero.
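The two empirical tests can be sketched as below. Several details are assumptions made for illustration, since the text does not fix them: the window is split into three equal segments of `seg` values, the confidence level defaults to 95%, and the interval uses a normal approximation (via `statistics.NormalDist`) in place of a Student's t critical value. The default threshold of 0.015 is the value reported in the results section.

```python
import math
from statistics import NormalDist
from typing import List, Optional, Tuple


def first_peak(phi_values: List[float], seg: int = 3) -> Optional[int]:
    """Index of the first window whose center averages above both flanks."""
    for start in range(len(phi_values) - 3 * seg + 1):
        left = phi_values[start:start + seg]
        mid = phi_values[start + seg:start + 2 * seg]
        right = phi_values[start + 2 * seg:start + 3 * seg]
        m = sum(mid) / seg
        if sum(left) / seg < m and sum(right) / seg < m:
            return start + seg + seg // 2  # middle of the center segment
    return None


def slope_with_ci(ys: List[float],
                  confidence: float = 0.95) -> Tuple[float, float, float]:
    """Least-squares slope of ys on n_b = 1..n with an approximate interval."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points")
    xs = range(1, n + 1)
    mx, my = (n + 1) / 2, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # normal approximation
    return slope, slope - z * se, slope + z * se


def is_flat_threshold(ys: List[float], threshold: float = 0.015) -> bool:
    """Empirical-Threshold: flat when the slope is below the threshold."""
    return slope_with_ci(ys)[0] < threshold


def is_flat_confidence(ys: List[float], confidence: float = 0.95) -> bool:
    """Empirical-Confidence: flat when the slope interval contains zero."""
    _, lo, hi = slope_with_ci(ys, confidence)
    return lo <= 0.0 <= hi
```

For short $\phi$ curves a t-based interval would be wider than the normal approximation used here, so this sketch is somewhat more willing to declare a curve non-flat than a t-based test would be.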

In sum, we have four stay/go decision methods: Analytic-Complete, Analytic-Estimate, Empirical-Threshold, and Empirical-Confidence. We now discuss their performance in a simulator using actual object images.

6  The  Stay/Go  Simulator

Recall that a CSS diagram represents curvature features of an image. We constructed five 3-dimensional objects with a variety of different surface features, allowing for both similarity and variance among CSS diagrams of images taken from different perspectives of the objects. The five object-shapes resemble a thin dog, a four-legged arch, a dog with two heads, a man, and a four-legged object with a round body (a "rotund" dog). We then collected 7 sets of images taken from a variety of perspectives of the objects, all from the level of the Pioneer II camera angle. Three of the image sets were taken from the thin dog; one of these consisted of a complete rotation around the dog, the other two only half rotations around either side. The sets contained 34 images on average.

A simulator was constructed to test the stay/go decision methods. At the beginning of each epoch, one of the 7 image sets is selected at random, and the simulator is presented with views randomly sampled from that set. As described in Section 3, all of the views from the selected set are assigned a unique label chosen at the beginning of the epoch. At each view presentation, the simulator decides whether to stay and see another view from that set, or go, terminating that epoch. The stay/go decision is calculated by comparing the current set of views against each set of previously experienced object views stored in memory. The decision to go cannot be made until the $\phi$ curve for each object in memory passes the decision method's test. After the completion of each epoch, the views sampled during the epoch are added to memory. For the experiments we conducted, each of which consisted of 100 epochs, the simulator's memory was seeded with the complete set of unique views from one of the 7 image sets.
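The epoch loop described above might be organized roughly as follows. This is a hedged sketch rather than the simulator itself: `run_epoch`, the `go_test` callback signature, sampling with replacement, and the `min_views` and `max_views` guards are all assumptions added here so that the example is self-contained and always terminates.

```python
import random
from typing import Callable, Dict, List, Sequence

View = Sequence[float]
# Hypothetical signature: go_test(stored_views, current_views) returns True
# when the phi curve for that stored set passes the chosen method's test.
GoTest = Callable[[List[View], List[View]], bool]


def run_epoch(image_sets: Sequence[List[View]],
              memory: Dict[str, List[View]],
              go_test: GoTest,
              label: str,
              rng: random.Random,
              min_views: int = 2,
              max_views: int = 500) -> int:
    """Run one epoch and return the number of views seen before going."""
    source = rng.choice(image_sets)          # pick one image set at random
    current: List[View] = []
    while len(current) < max_views:          # assumed safety cap
        current.append(rng.choice(source))   # assumed: sample with replacement
        # Go only when the test passes for every set stored in memory.
        if len(current) >= min_views and all(
                go_test(stored, current) for stored in memory.values()):
            break
    memory[label] = current                  # store the epoch under its label
    return len(current)
```

Because the go decision requires every stored set to pass, a single slow comparison delays the whole epoch, which is the behavior the results section attributes to the analytic methods.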

7  Results 

We tested the simulator with each of the four stay/go methods. Figures 4-6 plot the number of views before going for each epoch. For clarity, we use the Analytic-Estimate curve as a baseline to compare with each of the other methods' curves. Figure 4 shows that the Analytic-Complete and Analytic-Estimate methods are very similar, with the Analytic-Complete increasing at a slightly faster rate. To understand why both of these curves increase linearly with each epoch, observe that the size of the set of elements for the object in memory, $n_a$, appears in the calculation of Equation 7. That is, for each comparison, the number of views in a set from memory directly influences the go decision. However, the decision to go cannot be made until the comparison with each object in memory satisfies the go criterion. This means that the required number of views will be influenced by the largest set of views in memory. This is the key shortcoming of the analytic methods: as long as the number of objects in memory keeps accruing, the mean number of views will increase.

To avoid this problem, we turn to the empirical methods, which depend not on the size of the samples but on the behavior of $\phi$. The naive Empirical-Threshold method exemplifies a first-pass solution to this problem. Here, the threshold was set to slightly


Figure 4: Comparison of Analytic-Estimate and Analytic-Complete.


above zero (0.015), in order to catch those $\phi$ curves with slopes close to zero, but to avoid catching $\phi$ curves that may eventually have peaks. Figure 5 shows that this threshold is not desirable. At least for runs of under 100 epochs, the analytic methods perform much better. Empirical-Threshold does, however, have one positive property: despite the large variance in number of views required from one epoch to the next, the overall number of views required does not increase at the rate the analytic methods do. To see this more clearly, we have treated each method's results as a scatterplot and calculated linear regression lines for each. These are summarized in Table 1, which includes the overall mean number of views per epoch, the standard deviation (s.d.) of each mean, and the linear regression line slope, along with that slope's confidence interval. The Empirical-Threshold slope is 0.34, which is significantly less than the slopes of the two analytic methods (0.83 and 0.60).

The last method, Empirical-Confidence, provides a much better approach to determining the slope of $\phi$ curves that are close to flat. In this method, the confidence interval of the linear-regression line slope of the $\phi$ curve is also calculated, and if a slope of 0 falls within the interval, then the $\phi$ curve is considered essentially flat. Figure 6 plots the Empirical-Confidence results. Not only does it maintain a slope that is close to horizontal, but the number of views required for each epoch is also significantly lower than for any of the other methods. Of the four methods, Empirical-Confidence does the best job of empirically differentiating curves that have peaks from those that are flat.


Figure 5: Comparison of Analytic-Estimate and Empirical-Threshold.


Figure 6: Comparison of Analytic-Estimate and Empirical-Confidence.


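Table 1: Mean number of views per epoch, standard deviation, regression slope, and slope confidence interval for each method.

method              mean     s.d.     regression slope   c.i.
Analytic-Comp       73.90    24.64    0.83               0.0040
Analytic-Est        61.93    18.07    0.60               0.0035
Empirical-Thresh    69.17    22.97    0.34               0.0167
Empirical-Conf      38.32    12.34    0.15               0.0093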

8  Summary  and  Conclusions 

The Empirical-Confidence decision method has two interesting properties: (1) even early on it tends to require fewer views than the analytic methods do, and, more importantly, (2) the number of views required within each epoch remains relatively stable as the number of objects represented in memory increases. This makes the Empirical-Confidence method, based on the behavior of the $\phi$ statistic as views are accrued, an attractive approach to a general stay/go solution.

Up to this point, the sets of views experienced for each epoch have simply been stored in memory as individual sets. But many of these sets are actually sampled from the same object. The next step is to investigate methods for merging sets of views that are similar, and to evaluate how such merging interacts with the stay/go criterion over time. The goal is a simple visual object memory that maintains classes in memory that accurately represent classes of objects in the environment. Such a memory will be a significant step towards an autonomously developing agent.